Wiretap and Gelfand-Pinsker Channels Analogy and its Applications
An analogy framework between wiretap channels (WTCs) and state-dependent
point-to-point channels with non-causal encoder channel state information
(referred to as Gelfand-Pinsker channels (GPCs)) is proposed. A good sequence of
stealth-wiretap codes is shown to induce a good sequence of codes for a
corresponding GPC. Consequently, the framework enables exploiting existing
results for GPCs to produce converse proofs for their wiretap analogs. The
analogy readily extends to multiuser broadcasting scenarios, encompassing
broadcast channels (BCs) with deterministic components, degradation ordering
between users, and BCs with cooperative receivers. Given a wiretap BC (WTBC)
with two receivers and one eavesdropper, an analogous Gelfand-Pinsker BC (GPBC)
is constructed by converting the eavesdropper's observation sequence into a
state sequence with an appropriate product distribution (induced by the
stealth-wiretap code for the WTBC), and non-causally revealing the states to
the encoder. The transition matrix of the state-dependent GPBC is extracted
from the WTBC's transition law, with the eavesdropper's output playing the role of
the channel state. Past capacity results for the semi-deterministic (SD) GPBC
and the physically-degraded (PD) GPBC with an informed receiver are leveraged
to furnish analogy-based converse proofs for the analogous WTBC setups. This
characterizes the secrecy-capacity regions of the SD-WTBC and the PD-WTBC, in
which the stronger receiver also observes the eavesdropper's channel output.
These derivations exemplify how the wiretap-GP analogy enables translating
results on one problem into advances in the study of the other.
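To place the analogy in context, recall the classical single-user capacity formulas (textbook results quoted here only for orientation, not claims of this work): the Gelfand-Pinsker capacity with non-causal encoder state information and the Csiszár-Körner secrecy capacity of the wiretap channel are
\[
C_{\mathrm{GP}} = \max_{P_{U|S},\, x(u,s)} \big[ I(U;Y) - I(U;S) \big],
\qquad
C_{\mathrm{WT}} = \max_{P_{U,X}} \big[ I(U;Y) - I(U;Z) \big],
\]
so the eavesdropper's output $Z$ formally plays the role of the state $S$, which is precisely the substitution the construction above carries out at the broadcast level.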
Information Storage in the Stochastic Ising Model
Most information systems store data by modifying the local state of matter,
in the hope that atomic (or sub-atomic) local interactions would stabilize the
state for a sufficiently long time, thereby allowing later recovery. In this
work we initiate the study of information retention in locally-interacting
systems. The evolution in time of the interacting particles is modeled via the
stochastic Ising model (SIM). The initial spin configuration $X_0$ serves as the user-controlled input. The output configuration $X_t$ is produced by running $t$ steps of the Glauber chain. Our main goal is to evaluate the information capacity $I_n(t)$ when the time $t$ scales with the size of the system $n$. For the zero-temperature SIM on the
two-dimensional $\sqrt{n} \times \sqrt{n}$ grid and free boundary conditions, it is easy to show that $I_n(t) = \Theta(n)$ for $t = O(n)$. In addition, we show that on the order of $\sqrt{n}$ bits can be stored for infinite time in striped configurations. The $\sqrt{n}$ achievability is optimal when $t \to \infty$ and $n$ is fixed.
One of the main results of this work is an achievability scheme that stores more than $\sqrt{n}$ bits (in orders of magnitude) for superlinear (in $n$) times. The analysis of the scheme decomposes the system into independent Z-channels whose crossover probability is found via the (recently rigorously established) Lifshitz law of phase boundary movement. We also
provide results for the positive but small temperature regime. We show that an initial configuration drawn according to the Gibbs measure cannot retain more than a single bit beyond a certain time threshold. On the other hand, when scaling time with the inverse temperature $\beta$, the stripe-based coding scheme (that stores for infinite time at zero temperature) is shown to retain its bits for time that is exponential in $\beta$.
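As a concrete illustration of the dynamics and of the stripe-based storage idea, here is a minimal simulation sketch (an illustration under stated assumptions, not code from the paper; the grid size and stripe width are arbitrary choices):

```python
# Zero-temperature Glauber dynamics on an L x L grid with free boundary
# conditions, started from a striped configuration. At zero temperature an
# updated spin aligns with the strict majority of its neighbors and is
# resampled by a fair coin on a tie.
import numpy as np

def glauber_step(sigma, rng):
    L = sigma.shape[0]
    i, j = rng.integers(L), rng.integers(L)
    s = 0  # sum over existing neighbors only (free boundary conditions)
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < L and 0 <= nj < L:
            s += sigma[ni, nj]
    if s > 0:
        sigma[i, j] = 1
    elif s < 0:
        sigma[i, j] = -1
    else:
        sigma[i, j] = rng.choice((-1, 1))  # tie broken uniformly at random

def striped_input(L, stripe_width=2):
    # Horizontal stripes of width >= 2 are fixed points of the zero-temperature
    # dynamics, which is why such configurations can store bits indefinitely.
    row_sign = np.where((np.arange(L) // stripe_width) % 2 == 0, 1, -1)
    return np.repeat(row_sign[:, None], L, axis=1)

rng = np.random.default_rng(0)
sigma = striped_input(L=32)
for _ in range(10_000):
    glauber_step(sigma, rng)
assert np.array_equal(sigma, striped_input(L=32))  # the stripes survive
```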
Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation
This paper studies convergence of empirical measures smoothed by a Gaussian
kernel. Specifically, consider approximating $P \ast \mathcal{N}_\sigma$, for $\mathcal{N}_\sigma \triangleq \mathcal{N}(0, \sigma^2 \mathrm{I}_d)$, by $\hat{P}_n \ast \mathcal{N}_\sigma$, where $\hat{P}_n$ is the empirical measure of $n$ i.i.d. samples from $P$,
under different statistical distances. The convergence is examined in terms of
the Wasserstein distance, total variation (TV), Kullback-Leibler (KL)
divergence, and $\chi^2$-divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance ($\mathsf{W}_1$) converges at rate $e^{O(d)} n^{-1/2}$, in remarkable contrast to a typical $n^{-1/d}$ rate for unsmoothed $\mathsf{W}_1$ (and $d \geq 3$). For the KL divergence, squared 2-Wasserstein distance ($\mathsf{W}_2^2$), and $\chi^2$-divergence, the convergence rate is $e^{O(d)} n^{-1}$, but only if $P$ achieves finite input-output $\chi^2$ mutual information across the additive white Gaussian noise channel. If the latter condition is not met, the rate changes to $\omega(n^{-1})$ for the KL divergence and $\mathsf{W}_2^2$, while the $\chi^2$-divergence becomes infinite, a curious dichotomy. As a main
application we consider estimating the differential entropy $h(P \ast \mathcal{N}_\sigma)$ in the high-dimensional regime. The distribution $P$ is unknown, but $n$ i.i.d. samples from it are available. We first show that any good estimator of $h(P \ast \mathcal{N}_\sigma)$ must have sample complexity that is exponential in $d$. Using the empirical approximation results we then show that the absolute-error risk of the plug-in estimator converges at the parametric rate $e^{O(d)} n^{-1/2}$, thus establishing the minimax rate-optimality of the plug-in. Numerical results that demonstrate a significant empirical superiority of the plug-in approach to general-purpose differential entropy estimators are provided.
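For intuition, the plug-in estimator above is simply the differential entropy of an $n$-component Gaussian mixture and can be evaluated by Monte Carlo. A minimal sketch follows (illustrative code with arbitrary sample sizes, not the paper's implementation):

```python
# Plug-in estimator of h(P * N_sigma): replace P by the empirical measure of
# the samples, so the estimate is the differential entropy of an n-component
# Gaussian mixture, evaluated here by Monte Carlo.
import numpy as np
from scipy.special import logsumexp

def plugin_entropy(samples, sigma, num_mc=2000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n, d = samples.shape
    # Draw Monte Carlo points from the smoothed empirical measure P_hat * N_sigma.
    x = samples[rng.integers(n, size=num_mc)] \
        + sigma * rng.standard_normal((num_mc, d))
    # Log-density of the equal-weight Gaussian mixture at the Monte Carlo points.
    sq_dists = ((x**2).sum(1)[:, None] + (samples**2).sum(1)[None, :]
                - 2.0 * x @ samples.T)                      # shape (num_mc, n)
    log_q = (logsumexp(-sq_dists / (2 * sigma**2), axis=1)
             - np.log(n) - 0.5 * d * np.log(2 * np.pi * sigma**2))
    return -log_q.mean()  # Monte Carlo estimate of h(P_hat_n * N_sigma)

# Sanity check: for P = N(0, I_d), h(P * N_sigma) = (d/2) log(2*pi*e*(1+sigma^2)).
rng = np.random.default_rng(1)
d, n, sigma = 2, 1000, 1.0
est = plugin_entropy(rng.standard_normal((n, d)), sigma, rng=rng)
truth = 0.5 * d * np.log(2 * np.pi * np.e * (1 + sigma**2))
```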
Capacity of Continuous Channels with Memory via Directed Information Neural Estimator
Calculating the capacity (with or without feedback) of channels with memory
and continuous alphabets is a challenging task. It requires optimizing the
directed information (DI) rate over all channel input distributions. The
objective is a multi-letter expression, whose analytic solution is only known
for a few specific cases. When no analytic solution is present or the channel
model is unknown, there is no unified framework for calculating or even
approximating capacity. This work proposes a novel capacity estimation
algorithm that treats the channel as a 'black box', both with and without feedback. The algorithm has two main ingredients: (i) a neural distribution
transformer (NDT) model that shapes a noise variable into the channel input
distribution, which we are able to sample, and (ii) the DI neural estimator
(DINE) that estimates the communication rate of the current NDT model. These
models are trained by an alternating maximization procedure to both estimate
the channel capacity and obtain an NDT for the optimal input distribution. The
method is demonstrated on the moving average additive Gaussian noise channel,
where it is shown that both the capacity and feedback capacity are estimated
without knowledge of the channel transition kernel. The proposed estimation
framework opens the door to a myriad of capacity approximation results for
continuous-alphabet channels that were inaccessible until now.
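For reference, the multi-letter objective mentioned above is the directed information rate; in standard notation (a textbook definition, not specific to this work),
\[
I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}),
\qquad
C_{\mathrm{FB}} = \lim_{n \to \infty} \frac{1}{n} \max_{P_{X^n \| Y^{n-1}}} I(X^n \to Y^n),
\]
where the maximization is over causally conditioned input distributions and the limit characterizes feedback capacity under suitable conditions on the channel; without feedback the maximization is over unconditional input distributions $P_{X^n}$. The NDT parameterizes the input distribution in this optimization, while DINE estimates the resulting directed information rate.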
Max-Sliced Mutual Information
Quantifying the dependence between high-dimensional random variables is
central to statistical learning and inference. Two classical methods are
canonical correlation analysis (CCA), which identifies maximally correlated
projected versions of the original variables, and Shannon's mutual information,
which is a universal dependence measure that also captures high-order
dependencies. However, CCA only accounts for linear dependence, which may be
insufficient for certain applications, while mutual information is often
infeasible to compute/estimate in high dimensions. This work proposes a middle
ground in the form of a scalable information-theoretic generalization of CCA,
termed max-sliced mutual information (mSMI). mSMI equals the maximal mutual
information between low-dimensional projections of the high-dimensional
variables, which reduces back to CCA in the Gaussian case. It enjoys the best
of both worlds: capturing intricate dependencies in the data while being
amenable to fast computation and scalable estimation from samples. We show that
mSMI retains favorable structural properties of Shannon's mutual information,
like variational forms and identification of independence. We then study
statistical estimation of mSMI, propose an efficiently computable neural
estimator, and couple it with formal non-asymptotic error bounds. We present
experiments that demonstrate the utility of mSMI for several tasks,
encompassing independence testing, multi-view representation learning,
algorithmic fairness, and generative modeling. We observe that mSMI
consistently outperforms competing methods with little-to-no computational
overhead.
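Up to notation (the projection dimension $k$ and the orthonormality constraint are shorthand introduced here, so this should be read as a sketch of the definition rather than a verbatim quote), the quantity can be written as
\[
\mathsf{mSMI}_k(X;Y) = \sup_{A \in \mathrm{St}(d_x,k),\; B \in \mathrm{St}(d_y,k)} I\big(A^{\top} X;\, B^{\top} Y\big),
\]
where $\mathrm{St}(d,k)$ denotes the set of $d \times k$ matrices with orthonormal columns. For jointly Gaussian $(X,Y)$ and one-dimensional projections, the optimizers coincide with the leading CCA directions, which is the sense in which mSMI reduces back to CCA.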